Needletail
Needletail is an MIT-licensed, minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust.
The goal is to write a fast and well-tested set of functions that more specialized bioinformatics programs can use. Needletail aims to be as fast as the readfq C library at parsing FASTX files and much faster (around 25 times) than equivalent Python implementations at k-mer counting.
Example
extern crate needletail;
use ;
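To illustrate the k-mer counting mentioned above, here is a minimal, standard-library-only Rust sketch of canonical k-mer counting. It is not needletail's API or implementation: `count_canonical_kmers`, `canonical`, and `complement` are hypothetical helpers written for this illustration. Each k-mer is compared with its reverse complement and the lexicographically smaller of the two is counted, so AAAA and TTTT share one key. Windows containing anything other than A, C, G, or T (such as N) are skipped.

```rust
use std::collections::HashMap;

/// Complement a single uppercase nucleotide; other bytes are returned unchanged.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Return the lexicographically smaller of a k-mer and its reverse complement.
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc: Vec<u8> = kmer.iter().rev().map(|&b| complement(b)).collect();
    if rc.as_slice() < kmer {
        rc
    } else {
        kmer.to_vec()
    }
}

/// Count canonical k-mers in an uppercase sequence, skipping any window
/// that contains a base other than A, C, G or T (e.g. an N).
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, usize> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic
    }
    for window in seq.windows(k) {
        if window.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
            *counts.entry(canonical(window)).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = count_canonical_kmers(b"AAAATTTTNAAAA", 4);
    // AAAA and its reverse complement TTTT collapse to the same canonical key.
    let n = counts.get(b"AAAA".as_slice()).copied().unwrap_or(0);
    println!("AAAA (or TTTT): {}", n); // prints 3
}
```

Collapsing each k-mer with its reverse complement means a sequence and its opposite strand produce the same counts, which is usually what you want when the strand a read came from is unknown.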
Installation
Needletail requires Rust and Cargo to be installed.
Please use either your local package manager (homebrew, apt-get, pacman, etc.) or install these via rustup.
Once you have Rust set up, you can include needletail in your Cargo.toml file as follows:
[dependencies]
needletail = "0.4"
To install needletail itself for development:
git clone https://github.com/onecodex/needletail
cd needletail
cargo test # to run tests
Python
Documentation
For a complete example, refer to test_python.py.
The Python library raises only one type of exception: NeedletailError.
There are two ways to parse FASTA/FASTQ data: parse_fastx_string(content: str) if you have the content as a string, or parse_fastx_file(path: str) if you have a path to a file. Both functions return an iterator over records, and they raise if the file is not found or if the content is invalid.
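For intuition about what parsing FASTA text involves, here is a minimal FASTA-only parser in standard-library Rust. It is a hypothetical sketch (`parse_fasta_str` and `FastaRecord` are illustrative names), not needletail's parser or the seq_io algorithm it builds on: it does not handle FASTQ, it collects records into a `Vec` rather than returning a lazy iterator, and it reports an error when sequence data appears before the first `>` header.

```rust
/// A minimal FASTA record: the header line (without '>') and the
/// sequence lines concatenated together.
#[derive(Debug, PartialEq)]
struct FastaRecord {
    id: String,
    seq: String,
}

/// Parse FASTA text into records (hypothetical helper for illustration).
/// Blank lines are ignored; sequence before the first header is an error.
fn parse_fasta_str(content: &str) -> Result<Vec<FastaRecord>, String> {
    let mut records: Vec<FastaRecord> = Vec::new();
    for (line_no, line) in content.lines().enumerate() {
        let line = line.trim_end(); // ignore trailing whitespace
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // A header line starts a new record.
            records.push(FastaRecord { id: header.to_string(), seq: String::new() });
        } else {
            // Sequence lines are appended to the most recent record.
            match records.last_mut() {
                Some(rec) => rec.seq.push_str(line),
                None => return Err(format!("line {}: sequence before first header", line_no + 1)),
            }
        }
    }
    Ok(records)
}

fn main() {
    let text = ">seq1 first\nACGT\nAC\n>seq2\nTTGG\n";
    match parse_fasta_str(text) {
        Ok(records) => {
            for rec in records {
                println!("{}\t{}", rec.id, rec.seq);
            }
        }
        Err(e) => eprintln!("invalid FASTA: {}", e),
    }
}
```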
A record has the following shape:
:
:
:
Note that normalize (see https://docs.rs/needletail/0.4.1/needletail/sequence/fn.normalize.html for what it does) mutates self.seq in place.
It is also available as the standalone normalize_seq(seq: str, iupac: bool) function, which returns the normalized sequence instead of modifying a record.
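As a rough picture of what normalization means, here is a simplified Rust sketch based on the behaviour described in the linked docs.rs page. It is not needletail's implementation, and `normalize_sketch` is a hypothetical name: it uppercases, converts U to T, drops whitespace and line endings, keeps `-` gap characters, and turns every other character into N. This loosely corresponds to the `iupac = false` case; IUPAC handling and the conversion of other punctuation to gaps are deliberately omitted.

```rust
/// Simplified, hypothetical normalization (not needletail's code):
/// uppercase, U -> T, drop whitespace, keep '-' gaps, everything else -> N.
fn normalize_sketch(seq: &str) -> String {
    seq.chars()
        .filter(|c| !c.is_whitespace()) // strip spaces and line endings
        .map(|c| {
            let upper = c.to_ascii_uppercase();
            match upper {
                'U' => 'T', // treat RNA as DNA
                'A' | 'C' | 'G' | 'T' | 'N' | '-' => upper,
                _ => 'N', // IUPAC codes and anything else become N here
            }
        })
        .collect()
}

fn main() {
    // Mixed case, an RNA base (u), a line break, an IUPAC code (R) and junk (x).
    println!("{}", normalize_sketch("acgu\nRT- x")); // prints ACGTNT-N
}
```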
Lastly, there is a reverse_complement(seq: str) function that returns the reverse complement of a sequence. It does not raise an error if the sequence contains invalid characters.
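The reverse complement reverses the sequence and swaps each base for its pair (A with T, C with G). A minimal std-only Rust sketch follows; `reverse_complement` and `complement` here are hypothetical helpers, not needletail's code. Passing unrecognized characters (including N and IUPAC codes) through unchanged is an assumption of this sketch, chosen so that, like the function described above, it never fails on unexpected input.

```rust
/// Complement one nucleotide, preserving case. Any other character is
/// returned unchanged (a simplification of this sketch).
fn complement(c: char) -> char {
    match c {
        'A' => 'T',
        'C' => 'G',
        'G' => 'C',
        'T' => 'A',
        'a' => 't',
        'c' => 'g',
        'g' => 'c',
        't' => 'a',
        other => other,
    }
}

/// Reverse the sequence and complement each base; never fails.
fn reverse_complement(seq: &str) -> String {
    seq.chars().rev().map(complement).collect()
}

fn main() {
    println!("{}", reverse_complement("ACCGn")); // prints nCGGT
}
```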
Building
To work on the Python library on a Mac OS X/Unix system (requires Python 3):
# finally, install the library in the local virtualenv
To build the binary wheels and push them to PyPI:
# The Mac build requires switching between several different Python versions
maturin build --features python --release --strip
# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io ghcr.io/pyo3/maturin:main build --features=python --release --strip -f
twine upload target/wheels/*
Getting Help
Questions are best asked as GitHub issues. We plan to add more documentation soon; in the meantime, doc comments are included in the source.
Contributing
Please do! We're happy to discuss possible additions and/or accept pull requests.
Acknowledgements
Starting from 0.4, the parser algorithms are taken from seq_io. While they have been slightly modified, they mostly come from that library. Links to the original files are available in src/parser/fast{a,q}.rs.